Stable Feature Selection: Theory and Algorithms
نویسندگان
چکیده
Feature selection plays an important role in knowledge discovery from many application domains with high-dimensional data. Many feature selection algorithms have been developed and shown successful at improving predictive accuracy of learning models while reducing feature space dimensionality and model complexity. Besides high accuracy, the stability of feature selection the insensitivity of the result of a feature selection algorithm to variations to the training set, is another important yet under-addressed issue for feature selection. The stability issue has become increasingly critical in application domains where feature selection is used as a knowledge discovery tool to identify important features for explaining the observed phenomena, such as biomarker identification in cancer diagnosis. In this dissertation, we present a theoretical framework about the relationship between the stability and the accuracy of feature selection based on a formal biasvariance decomposition of feature selection error. The framework also reveals the connection between stability and sample size and suggests a variance reduction approach for improving the stability of feature selection algorithms under small sample size. Following the theoretical framework, we also develop an empirical variance reduction framework and margin based instance weighting algorithms under this framework. Moreover, our extensive experimental study verifies the theoretical framework and the empirical framework based on both synthetic data sets and real-world microarray data sets. Our results show that the empirical framework is effective at reducing the variance and improving the stability of two representative feature selection algorithms, SVM-RFE and ReliefF, while maintaining comparable predictive accuracy based on the selected features. The proposed instance weighting framework is also shown to be more effective and efficient than the existing ensemble framework at improving the subset stability of the feature selection algorithms under study.
منابع مشابه
Online Streaming Feature Selection Using Geometric Series of the Adjacency Matrix of Features
Feature Selection (FS) is an important pre-processing step in machine learning and data mining. All the traditional feature selection methods assume that the entire feature space is available from the beginning. However, online streaming features (OSF) are an integral part of many real-world applications. In OSF, the number of training examples is fixed while the number of features grows with t...
متن کاملSequential and Mixed Genetic Algorithm and Learning Automata (SGALA, MGALA) for Feature Selection in QSAR
Feature selection is of great importance in Quantitative Structure-Activity Relationship (QSAR) analysis. This problem has been solved using some meta-heuristic algorithms such as: GA, PSO, ACO, SA and so on. In this work two novel hybrid meta-heuristic algorithms i.e. Sequential GA and LA (SGALA) and Mixed GA and LA (MGALA), which are based on Genetic algorithm and learning automata for QSAR f...
متن کاملSequential and Mixed Genetic Algorithm and Learning Automata (SGALA, MGALA) for Feature Selection in QSAR
Feature selection is of great importance in Quantitative Structure-Activity Relationship (QSAR) analysis. This problem has been solved using some meta-heuristic algorithms such as: GA, PSO, ACO, SA and so on. In this work two novel hybrid meta-heuristic algorithms i.e. Sequential GA and LA (SGALA) and Mixed GA and LA (MGALA), which are based on Genetic algorithm and learning automata for QSAR f...
متن کاملImproving of Feature Selection in Speech Emotion Recognition Based-on Hybrid Evolutionary Algorithms
One of the important issues in speech emotion recognizing is selecting of appropriate feature sets in order to improve the detection rate and classification accuracy. In last studies researchers tried to select the appropriate features for classification by using the selecting and reducing the space of features methods, such as the Fisher and PCA. In this research, a hybrid evolutionary algorit...
متن کاملA Classification Method for E-mail Spam Using a Hybrid Approach for Feature Selection Optimization
Spam is an unwanted email that is harmful to communications around the world. Spam leads to a growing problem in a personal email, so it would be essential to detect it. Machine learning is very useful to solve this problem as it shows good results in order to learn all the requisite patterns for classification due to its adaptive existence. Nonetheless, in spam detection, there are a large num...
متن کاملA New Hybrid Method for Improving the Performance of Myocardial Infarction Prediction
Abstract Introduction: Myocardial Infarction, also known as heart attack, normally occurs due to such causes as smoking, family history, diabetes, and so on. It is recognized as one of the leading causes of death in the world. Therefore, the present study aimed to evaluate the performance of classification models in order to predict Myocardial Infarction, using a feature selection method tha...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2012